Abstract. In this paper, we introduce a Foundational Proof-Carrying Code (FPCC)
framework for constructing certified code packages from typed assembly language
that will interface with a similarly certified runtime system. Our framework
permits the typed assembly language to have a “foreign function” interface,
in which stubs, initially provided when the program is being written, are eventually
compiled and linked to code that may have been written in a language with
a different type system, or even certified directly in the FPCC logic using a proof
assistant. We have increased the potential scalability and flexibility of our FPCC
system by providing a way to integrate programs compiled from different source
type systems. In the process, we are explicitly manipulating the interface between
Hoare logic and a syntactic type system.


1  Introduction 

Proof-Carrying Code (PCC) [16, 17] is a framework for generating executable machine
code along with a machine-checkable proof that the code satisfies a given safety
policy. The initial PCC systems specified the safety policy using a logic extended with
many (source) language-specific rules. While allowing implementation of a scalable
system [18, 7], this approach to PCC suffers from too large a trusted computing base
(TCB). It is still difficult to trust that the components of this system (the
verification-condition generator, the proof-checker, and even the logical axioms and
typing rules) are free from error.

The development of another family of PCC implementations, known as Foundational
Proof-Carrying Code (FPCC) [4, 3], was intended to reduce the TCB to a minimum
by expressing and proving safety using only a foundational mathematical logic
without additional language-specific axioms or typing rules. The trusted components in
such a system are mostly reduced to a much simpler logic and the proof-checker for it.

Both these approaches to PCC have one feature in common, which is that they
have focused on a single source language (e.g. Java or ML) and compile (type-correct)
programs from that language into machine code with a safety proof. However, the runtime
systems of these frameworks still include components that are not addressed in
the safety proof [3, 10] and that are written in a lower-level language (like C): memory
management libraries, garbage collection, debuggers, marshallers, etc. The issue of
producing a safety proof for code that is compiled and linked together from two or more
different source languages was not addressed.

In this paper, we introduce an FPCC framework for constructing certified machine
code packages from typed assembly language (TAL) that will interface with a similarly
certified runtime system. Our framework permits the typed assembly language to have
a “foreign function” interface in which stubs, initially provided when the program is
being written, are eventually compiled and linked to code that may have been written
in a language with a different type system, or even certified directly in the FPCC
logic using a proof assistant. To our knowledge, this is the first account of combining
such certification proofs from languages at different levels of abstraction. While type
systems such as TAL facilitate reasoning about many programs, they are not sufficient
for certifying the most low-level system libraries. Hoare logic-style reasoning, on the
other hand, can handle low-level details very well but cannot account for embedded
code pointers in data structures, a feature common to higher-order and object-oriented
programming. We outline for the first time a way to allow both methods of verification
to interact, gaining the advantages of both and circumventing their shortcomings.

Experience has shown that foundational proofs are much harder to construct than
those in a logic extended with type-specific axioms. The earliest FPCC systems built
proofs by constructing sophisticated semantic models of types in order to reason about
safety at the machine level. That is, the final safety proof incorporated no concept of
source level types: each type in the source language would be interpreted as a predicate
on the machine state and the typing rules of the language would turn into lemmas which
must prove properties about the interaction of these predicates. While it seems that
this method of FPCC would already be amenable to achieving the goals outlined in
the previous paragraph, the situation is complicated by the complexity of the semantic
models [11, 5, 1] that were required to support a realistic type system. Nonetheless, the
overall framework of this paper may work equally well with the semantic approach.

In this paper, we adopt the “syntactic” approach to FPCC, introduced in [13, 14] and
further applied to a more realistic source type system by [9, 10]. In this framework, the
machine level proofs do indeed incorporate and use the syntactic encoding of elements
of the source type system to derive safety. Previous presentations of the syntactic
approach involve a monolithic translation from type-correct source programs to a package
of certified machine code. In this paper, we refine the approach by inserting a generic
layer of reasoning above the machine code which can (1) be a target for the compilation
of typed assembly languages, (2) certify low-level runtime system components using
assertions as in Hoare logic, and (3) “glue” together these pieces by reasoning about the
compatibility of the interfaces specified by the various types of source code.

A simple diagram of our framework is given in Figure 1. Source programs are written
in a typed high-level language and then passed through a certifying compiler to
produce machine code along with a proof of safety. The source level type system may
provide a set of functionality that is accessed through a library interface. At the machine
level, there is an actual library code implementation that should satisfy that interface.
The non-trivial problem is how to design the framework such that not only will the two
pieces of machine code link together to run, but that the safety proofs originating from
two different sources are also able to “link” together, consistent with the high-level
interface specification, to produce a unified safety proof for the entire set of code.

[Figure 1: diagram of the framework; recoverable label: High-level Type System]

Notice that the interaction between program and library is two-way: either piece of
code may make direct or indirect function calls and returns to the other. Ideally, we want
to be able to certify the library code with no knowledge of the source language and type
system that will be interacting with it. At the same time we would like to support
first-class code pointers at all levels of the code. Methods for handling code pointers properly
have been one of the main challenges of FPCC and are one of the differentiating factors
between semantic and syntactic FPCC approaches. For the framework in this paper, we
have factored out most of the code pointer reasoning that is needed when certifying
library code so that the proofs thereof can be relatively straightforward.

In the following sections, after defining our machine and logic, we present the layer
of reasoning which will serve as the common interface for code compiled from different
sources. Then we present a typical typed assembly language, extended with library
interfaces and external call facilities. We finally show how to compile this language to
the target machine, expanding external function stubs, and linking in the runtime library,
at the same time producing the proof of safety of the complete package. We conclude
with a brief discussion of implementation in the Coq proof assistant and future and
related work.

2  A  Machine  and  Logic  for  Certified  Code 

In this section, we present the machine on which programs will run and the logic that
we use to reason about safety of the code being run. We use an idealized machine for
purposes of presentation in this paper, although an implementation on the IA-32 (Intel
x86 architecture) is in progress. A “real” machine introduces many engineering details
(e.g. fixed-size integers, addressing modes, memory model, variable length instructions
and relative addressing) which we would rather avoid while presenting the central
contributions of this paper.


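$$
\begin{array}{rcl}
\mathit{Word} \ni w, i, pc & ::= & 0 \mid 1 \mid \ldots \\
\mathit{Regt} \ni r & ::= & \mathtt{r0} \mid \mathtt{r1} \mid \ldots \mid \mathtt{r15} \\
\mathit{Cmd} \ni c & ::= & \mathtt{add}\ r_d, r_s, r_t \mid \mathtt{addi}\ r_d, r_s, i \mid \mathtt{mov}\ r_d, r_s \mid \mathtt{movi}\ r_d, i \\
 & \mid & \mathtt{bgt}\ r_s, r_t, w \mid \mathtt{bgti}\ r_s, i, w \mid \mathtt{ld}\ r_d, r_s(i) \mid \mathtt{st}\ r_d(i), r_s \\
 & \mid & \mathtt{jd}\ w \mid \mathtt{jmp}\ r \mid \mathtt{illegal} \\[4pt]
M \in \mathit{Mem} & = & \mathit{Word} \rightarrow \mathit{Word} \\
R \in \mathit{RFile} & = & \mathit{Regt} \rightarrow \mathit{Word} \\
S \in \mathit{State} & = & \mathit{Mem} \times \mathit{RFile} \times \mathit{Word}
\end{array}
$$

Fig. 2. Machine state: memory, registers, and instructions (commands).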


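$$
\begin{array}{ll}
\text{if } Dc(M(pc)) = & \text{then } Step(M, R, pc) = \\
\hline
\mathtt{add}\ r_d, r_s, r_t & (M,\ R\{r_d \mapsto R(r_s) + R(r_t)\},\ pc + 1) \\
\mathtt{addi}\ r_d, r_s, i & (M,\ R\{r_d \mapsto R(r_s) + i\},\ pc + 1) \\
\mathtt{mov}\ r_d, r_s & (M,\ R\{r_d \mapsto R(r_s)\},\ pc + 1) \\
\mathtt{movi}\ r_d, i & (M,\ R\{r_d \mapsto i\},\ pc + 1) \\
\mathtt{ld}\ r_d, r_s(i) & (M,\ R\{r_d \mapsto M(R(r_s) + i)\},\ pc + 1) \\
\mathtt{st}\ r_d(i), r_s & (M\{R(r_d) + i \mapsto R(r_s)\},\ R,\ pc + 1) \\
\mathtt{bgt}\ r_s, r_t, w & (M, R, pc + 1) \text{ when } R(r_s) \le R(r_t) \text{ and } (M, R, w) \text{ when } R(r_s) > R(r_t) \\
\mathtt{bgti}\ r_s, i, w & (M, R, pc + 1) \text{ when } R(r_s) \le i \text{ and } (M, R, w) \text{ when } R(r_s) > i \\
\mathtt{jd}\ w & (M, R, w) \\
\mathtt{jmp}\ r & (M, R, R(r)) \\
\mathtt{illegal} & (M, R, pc)
\end{array}
$$

Fig. 3. Machine semantics.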


2.1  The  Machine 

The hardware components of our idealized machine are a memory, register file, and a
special register containing the current program counter (pc). These are defined to be
the machine state, as shown in Figure 2. We use a 16-register word-addressed machine
with an unbounded memory of unlimited-size words. We also define a decoding function
Dc which decodes integer words into a structured representation of instructions
(“commands”), also shown in Figure 2. The machine is thus equipped with a Step function
that describes the (deterministic) transition from one machine state to the next,
depending on the instruction at the current pc.

The operational semantics of the machine is given in Figure 3. The instructions’ effects
are quite intuitive. The first half involve arithmetic and data movement in registers.
The ld and st instructions load and store data from/to memory. These are followed by the
conditional and unconditional branch instructions. An illegal (non-decodable) instruction
puts the machine in an infinite loop.
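
As an illustration only, the transition function of Figure 3 can be sketched in Python. The sketch makes assumptions that are not part of the paper: registers are named by strings, memory is a dictionary whose unset cells read as 0, and the decoder `dc` is a hypothetical stand-in that treats a memory cell holding a tuple as an already-decoded command (the paper's Dc decodes integer words and is not specified here).

```python
from typing import Dict, Tuple

Mem = Dict[int, object]    # address -> word; code cells hold decoded commands here
RFile = Dict[str, int]     # register name -> word
State = Tuple[Mem, RFile, int]


def dc(word):
    """Hypothetical decoder: a cell that holds a tuple is taken to be a command."""
    return word if isinstance(word, tuple) else ("illegal",)


def step(state: State) -> State:
    """One deterministic machine transition, following the table of Fig. 3."""
    mem, regs, pc = state
    cmd = dc(mem.get(pc, 0))
    op, args = cmd[0], cmd[1:]
    if op == "add":
        rd, rs, rt = args
        return mem, {**regs, rd: regs[rs] + regs[rt]}, pc + 1
    if op == "addi":
        rd, rs, i = args
        return mem, {**regs, rd: regs[rs] + i}, pc + 1
    if op == "mov":
        rd, rs = args
        return mem, {**regs, rd: regs[rs]}, pc + 1
    if op == "movi":
        rd, i = args
        return mem, {**regs, rd: i}, pc + 1
    if op == "ld":
        rd, rs, i = args
        return mem, {**regs, rd: mem.get(regs[rs] + i, 0)}, pc + 1
    if op == "st":
        rd, i, rs = args
        return {**mem, regs[rd] + i: regs[rs]}, regs, pc + 1
    if op == "bgt":
        rs, rt, w = args
        return mem, regs, (w if regs[rs] > regs[rt] else pc + 1)
    if op == "bgti":
        rs, i, w = args
        return mem, regs, (w if regs[rs] > i else pc + 1)
    if op == "jd":
        return mem, regs, args[0]
    if op == "jmp":
        return mem, regs, regs[args[0]]
    return mem, regs, pc   # illegal: the state does not change, so the machine loops


# Usage: load 2 and 3, add them, then spin on a direct jump.
program = {0: ("movi", "r0", 2), 1: ("movi", "r1", 3),
           2: ("add", "r2", "r0", "r1"), 3: ("jd", 3)}
state = (program, {f"r{k}": 0 for k in range(16)}, 0)
for _ in range(3):
    state = step(state)
```

Each call returns a fresh state rather than mutating its argument, which mirrors the functional reading of Step as a map from states to states.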

2.2  The  Logic 

In order to produce FPCC packages, we need a logic in which we can express (encode)
the operational semantics of the machine as well as define the concept and criteria of
safety. A code producer must then provide a code executable (initial machine state)
along with a proof that the initial state and all future transitions therefrom satisfy the
safety condition.

The foundational logic we use is the calculus of inductive constructions (CiC) [24, 20].
CiC is an extension of the calculus of constructions (CC) [8], which is a higher-order
typed lambda calculus. Due to limited space we forgo a discussion of CiC here
and refer the reader unfamiliar with the system to the cited references.

CiC has been shown to be strongly normalizing [25], hence the corresponding logic
is consistent. It is supported by the Coq proof assistant [24], which we use to implement
a prototype system of the results presented in this paper.

2.3  Defining  Safety  and  Generating  Proofs 

The safety condition is a predicate expressing the fact that code will not “go wrong.”
We say that a machine state S is safe if every state it can ever reach satisfies the safety
policy SP:

$$\mathit{Safe}(S, SP) \;=\; \Pi n : \mathit{Nat}.\ SP(\mathit{Step}^n(S))$$

A typical safety policy may require, for example, that the program counter point
to a valid instruction address in the code area and that any write to (or read from)
memory touch only a properly accessible area of the data space. For the purposes of
presentation in this paper, we will be using a very simple safety policy, requiring only
that the machine is always at a valid instruction:

$$\mathit{BasicSP}(M, R, pc) \;=\; Dc(M(pc)) \neq \mathtt{illegal} \;\land\; \mathit{InCodeArea}(M, pc)$$

We can easily define access controls on memory reads and writes by including another
predicate in the safety policy, SafeRdWr(M, R, pc). By reasoning over the number
of steps of computation, more complex safety policies including temporal constraints
can potentially be expressed. However, we will not be dealing with such policies here.

The FPCC code producer has to provide an encoding (see footnote 1) of the initial state
$S_0$ along with a proof A that this state satisfies the safety condition BasicSP, specified
by the code consumer. The final FPCC package is thus a pair:

$$F = (S_0 : \mathit{State},\ A : \mathit{Safe}(S_0, \mathit{BasicSP})).$$
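
The definition of Safe quantifies over every number of steps, which is why an FPCC package must carry a proof rather than a test result. The following Python sketch, offered only as an illustration, checks the policy on the first few reachable states. The names `safe_up_to`, `step`, and `sp` are hypothetical and the toy states are plain integers; a bounded check of this kind can refute safety but can never establish it.

```python
from typing import Callable, TypeVar

S = TypeVar("S")


def safe_up_to(step: Callable[[S], S], sp: Callable[[S], bool],
               state: S, bound: int) -> bool:
    """Check SP(Step^n(state)) for n = 0 .. bound (a finite approximation of Safe)."""
    for _ in range(bound + 1):
        if not sp(state):
            return False
        state = step(state)
    return True


# Toy usage: the "state" is just a program counter that cycles through 0..3,
# and the policy says the counter must stay inside a four-word code area.
cycles_in_code = safe_up_to(lambda pc: (pc + 1) % 4, lambda pc: pc < 4, 0, 100)
runs_off_the_end = safe_up_to(lambda pc: pc + 1, lambda pc: pc < 4, 0, 100)
```

In the toy run the first machine passes the bounded check and the second fails it at the fifth state, which is the kind of behaviour the proof A in the package rules out for all n.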

3  A  Language  for  Certified  Machine  Code  (CAP) 

We know now what type of proof we are looking for; the hard part is to generate that
proof of safety. Previous approaches for FPCC [4, 2, 5, 13] have achieved this by
constructing an induction hypothesis, also known as the global invariant, which can be
proven (e.g. by induction) to hold for all states reachable from the initial state and is
strong enough to imply the safety condition. The nature of the invariant has ranged
from a semantic model of types at the machine level (Appel et al. [4, 2, 5, 23]) to a
purely syntactic well-formedness property [13, 14] based on a type-correct source
program in a typed assembly language.

What we have developed in this paper refines these previous approaches. We will
still be presenting a typed assembly language in Section 4, in which most source
programs are written. However, we introduce another layer between the source type system
and the “raw” encoding of the target machine in the FPCC logic. This is a “type system”
or “specification system” that is defined upon the machine encoding, allowing us to
reason about its state using assertions that essentially capture Hoare logic-style reasoning.
Such a layer allows more generality for reasoning than a fixed type system, yet at the
same time is more structured than reasoning directly in the logic about the machine
encoding.

1 We must trust that our encoding of the machine and its operational semantics, and the definition
of safety, are correct. Along with the logic itself and the proof-checker implementation thereof,
these make up most of our software trusted computing base (TCB).

Our  language  is  called  CAP  and  it  uses  the  same  machine  syntax  as  presented  in 
Figure  2.  The  syntax  of  the  additional  assertion  layer  is  given  below: 

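$$
\begin{array}{rcl}
P, Q, R \in \mathit{Pred} & = & \mathit{State} \rightarrow \mathit{Prop} \\
\Phi \in \mathit{CdSpec} & = & \mathit{Word} \rightharpoonup (\mathit{Word} \times \mathit{Pred}) \\
\mathit{CmdList} \ni C & ::= & \emptyset \mid c :: C \\
\mathit{WordList} \ni W & ::= & \emptyset \mid w :: W
\end{array}
$$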

The name CAP is derived from its being a “Certified Assembly Programming” language.
An initial version was introduced in [27] and used to certify a dynamic storage
allocation library. The version we have used for this paper introduces some minor
improvements such as a unified data and code memory, assertions on the whole machine
state, and support for user-specifiable safety policies (Section 3.3).

Assertions (P, Q, R) are predicates on the machine state and the code specification
($\Phi$) is a partial function mapping memory addresses to a pair of an integer and a
predicate. The integer gives the length of the command sequence at that address and the
predicate is the precondition for the block of code. (The function of this is to allow us
to specify the addresses of valid code areas of memory based on $\Phi$.)

The  operational  semantics  of  the  language  has  already  been  presented  in  Section  2.1. 
We  now  introduce  CAP  inference  rules  followed  by  some  important  safety  theorems. 

3.1  Inference  Rules 

CAP adds a layer of inference rules (“typing rules”) allowing us to prove specification
judgments of the forms:

$$
\begin{array}{ll}
\Phi \vdash \{P\}\ C & \text{well-formed command sequence} \\
\vdash M : \Phi & \text{well-formed code specification} \\
\vdash (M, R, pc) & \text{well-formed machine state}
\end{array}
$$

The inference rules for these judgments are shown in Figure 4. The rules for well-formed
command sequences essentially require that if the given precondition P is satisfied
in the current state, there must be some postcondition Q, which is the precondition
of the remaining sequence of commands, that holds on the state after executing one
step. The rules directly refer to the Step function of the machine; control flow
instructions additionally use the code specification environment $\Phi$ in order to allow for the
certification of mutually dependent code blocks.

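We group as “pure” commands all those which do not involve control flow and do
not change the memory (i.e. everything other than branches, jumps, and st). The st
command requires an additional proof that the address being stored to is not in the code
area (i.e. we do not permit self-modifying code). curcmd(S) is defined as:

$$\mathit{curcmd}(M, R, pc) = Dc(M(pc))$$

$$
\frac{c \in \{\mathtt{add}, \mathtt{addi}, \mathtt{mov}, \mathtt{movi}, \mathtt{ld}\} \qquad \forall S.\ (P(S) \land \mathit{curcmd}(S) = c) \rightarrow Q(\mathit{Step}(S)) \qquad \Phi \vdash \{Q\}\ C}
     {\Phi \vdash \{P\}\ c :: C}
\ \text{(CAP-PURE)}
$$

$$
\frac{\forall S.\ (P(S) \land \mathit{curcmd}(S) = \mathtt{st}\ r_d(i), r_s) \rightarrow Q(\mathit{Step}(S)) \land \lnot \mathit{InCodeArea}(\Phi, S.R(r_d) + i) \qquad \Phi \vdash \{Q\}\ C}
     {\Phi \vdash \{P\}\ \mathtt{st}\ r_d(i), r_s :: C}
\ \text{(CAP-ST)}
$$

$$
\frac{\begin{array}{c}
\forall S.\ (P(S) \land \mathit{curcmd}(S) = \mathtt{bgt}\ r_s, r_t, w) \rightarrow (S.R(r_s) \le S.R(r_t) \rightarrow Q(\mathit{Step}(S))) \land (S.R(r_s) > S.R(r_t) \rightarrow Q'(\mathit{Step}(S))) \\
\Phi \vdash \{Q\}\ C \qquad \text{where } \Phi(w) = (n, Q')
\end{array}}
     {\Phi \vdash \{P\}\ \mathtt{bgt}\ r_s, r_t, w :: C}
\ \text{(CAP-BGT)}
$$

$$
\frac{\forall S.\ (P(S) \land \mathit{curcmd}(S) = \mathtt{jd}\ w) \rightarrow Q'(\mathit{Step}(S)) \qquad \text{where } \Phi(w) = (n, Q')}
     {\Phi \vdash \{P\}\ \mathtt{jd}\ w :: \emptyset}
\ \text{(CAP-JD)}
$$

$$
\frac{\forall S.\ (P(S) \land \mathit{curcmd}(S) = \mathtt{jmp}\ r) \rightarrow Q'(\mathit{Step}(S)) \quad \text{where } \Phi(S.R(r)) = (n, Q')}
     {\Phi \vdash \{P\}\ \mathtt{jmp}\ r :: \emptyset}
\ \text{(CAP-JMP)}
$$

$$
\frac{\mathit{Flatten}(W, M, f) \qquad \Phi \vdash \{P\}\ (\mathit{Map}(Dc, W)) \qquad \text{for all } f \text{ where } \Phi(f) = (\mathit{length}(W), P)}
     {\vdash M : \Phi}
\ \text{(CAP-CDSPEC)}
$$

$$
\frac{\vdash M : \Phi \qquad \Phi \vdash \{P\}\ (\mathit{Map}(Dc, W)) \qquad \mathit{Flatten}(W, M, pc) \qquad \mathit{InCodeArea}(\Phi, pc) \qquad P(M, R, pc)}
     {\vdash (M, R, pc)}
\ \text{(CAP-STATE)}
$$

Fig. 4. CAP inference rules.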

The InCodeArea predicate in the rules uses the code addresses and sequence lengths
in $\Phi$ to determine whether a given address lies within the code area. The (CAP-CDSPEC)
rule ensures that the addresses and sequence lengths specified in $\Phi$ are consistent with
the code actually in memory.

The Flatten predicate is defined as:

$$
\begin{array}{rcl}
\mathit{Flatten}(\emptyset, M, f) & = & \mathit{True} \\
\mathit{Flatten}(w :: W, M, f) & = & M(f) = w \;\land\; \mathit{Flatten}(W, M, f + 1)
\end{array}
$$
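
The two predicates can be illustrated with a short Python sketch. The encodings are assumptions made only for this example: memory and the code specification are dictionaries, and `in_code_area` reads "lies within the code area" as "falls inside some block [f, f + length) listed in the specification", which is one plausible reading of the description above rather than the definition used in the Coq development.

```python
from typing import Callable, Dict, List, Tuple

Pred = Callable[[object], bool]
CdSpec = Dict[int, Tuple[int, Pred]]   # address -> (block length, precondition)


def flatten(words: List[int], mem: Dict[int, int], f: int) -> bool:
    """Flatten(W, M, f): the words of W occupy M at addresses f, f+1, ..."""
    return all(mem.get(f + k) == w for k, w in enumerate(words))


def in_code_area(spec: CdSpec, addr: int) -> bool:
    """Assumed reading: addr falls inside some block [f, f + n) listed in spec."""
    return any(f <= addr < f + n for f, (n, _pre) in spec.items())


# Usage: a three-word block at address 10 with a trivial precondition.
mem = {10: 7, 11: 8, 12: 9}
spec = {10: (3, lambda state: True)}
block_is_in_memory = flatten([7, 8, 9], mem, 10)
store_target_is_code = in_code_area(spec, 12)
```

The empty word list is flattened anywhere, matching the base case of the definition, and the second predicate is what the (CAP-ST) rule negates to forbid stores into code.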

3.2  Safety  Properties 

The machine will execute continuously, even if an illegal instruction is encountered.
Given a well-formed CAP state, however, we can prove that it satisfies our basic safety
policy, and that executing the machine one step will result again in a good CAP state.

Theorem 1 (Safety Policy and Preservation).

For any state S, if $\vdash S$ then (1) BasicSP(S) and (2) $\vdash \mathit{Step}^n(S)$ for all n.

For the purposes of FPCC, we are interested in obtaining safety proofs in the context
of our policy as described in Section 2.3. From Theorem 1 we can easily derive:

Theorem 2 (CAP Safety). For any S, if $\vdash S$ then Safe(S, BasicSP).
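
The step from Theorem 1 to Theorem 2 can be spelled out as follows. This is only a sketch of the argument, reading part (2) of Theorem 1 as well-formedness of $\mathit{Step}^n(S)$ for every n; the mechanized proof in Coq may be organized differently.

$$
\begin{array}{lll}
1. & \vdash S & \text{(assumption)} \\
2. & \vdash \mathit{Step}^n(S) \ \text{for every } n & \text{(Theorem 1, part 2, applied to } S\text{)} \\
3. & \mathit{BasicSP}(\mathit{Step}^n(S)) \ \text{for every } n & \text{(Theorem 1, part 1, applied to each } \mathit{Step}^n(S)\text{)} \\
4. & \mathit{Safe}(S, \mathit{BasicSP}) & \text{(definition of } \mathit{Safe} \text{ from Section 2.3)}
\end{array}
$$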


Thus, to produce an FPCC package we just need to prove that the initial machine
state is well-formed with respect to the CAP inference rules. This provides a structured
method for constructing FPCC packages in our logic. However, programming and
reasoning in CAP is still much too low-level for the practical programmer. We thus need to
provide a method for compiling programs from a higher-level language and type system
to CAP. The main purpose of programming directly in CAP will then be to “glue” code
together from different source languages and to certify particularly low-level libraries
such as memory management. In the next few sections, we present a “conventional”
typed assembly language and show how to compile it to CAP.

3.3  Advanced  safety  policies 

In the theorems above, and for the rest of this paper, we are only interested in proving
safety according to our basic safety policy. For handling more general safety policies
using CAP, we can extend our CAP inference rules by parameterizing them with a
“global safety predicate” SP: $\Phi \vdash_{SP} \{P\}\ C$, $\vdash_{SP} M : \Phi$, and $\vdash_{SP} (M, R, pc)$.

The inference rule for each command in this extended system requires an additional
premise that the precondition for the command implies the global safety predicate.
Then, using a generalized version of Theorem 1, we can establish that:

Theorem 3. For any S and SP, if $\vdash_{SP} S$ then $\mathit{Safe}(S,\ \lambda S' : \mathit{State}.\ SP(S') \land \mathit{BasicSP}(S'))$.

Threading an arbitrary SP through the typing rules is a novel feature not found
in the initial version of CAP [27]. In that case, there was no way to specify that an
arbitrary safety policy beyond BasicSP (which essentially provides type safety) must
hold at every step of execution.

4  Extensible  Typed  Assembly  Language  with  Runtime  System 

In this section, we introduce an extensible typed assembly language (XTAL) based on
that of Morrisett et al. [15]. After presenting the full syntax of XTAL, we give here
only a brief overview of its static and dynamic semantics, due to space constraints
of this paper. A more complete definition of the language can be found in the Coq
implementation itself or the technical report [12].

4.1  Syntax 

To simplify the presentation, we will use a much scaled down version of typed assembly
language (see Figure 5): its types involve only integers, pairs, and integer arrays.
(We have extended our prototype implementation to include existential, recursive, and
polymorphic code types.) The code type $\forall[\Gamma]$ describes a code pointer that expects a
register file satisfying $\Gamma$. The register file type assigns types to the word values in each
register and the heap type keeps track of the heap values in the data heap. We have
separated the code and data heaps at this level of abstraction because the code heap will
remain the same throughout the execution of a program.

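Unlike many conventional TALs, our language supports “stub values” in its code
heap. These are placeholders for code that will be linked in later from another source
(outside the XTAL system). Primitive “macro” instructions that might be built into other
TALs, such as array creation and access operations, can be provided as an external
library with interface specified as XTAL types. We have also included a typical macro
instruction for allocating pairs (newpair) in the language. When polymorphic types are
added to the language, this macro instruction could potentially be provided through the
external code interface; however, in general, providing built-in primitives can allow for
a richer specification of the interface (see the typing rule for newpair below).

$$
\begin{array}{llcl}
\text{(type)} & \tau & ::= & \mathtt{int} \mid \mathtt{array} \mid \tau_0 \times \tau_1 \mid \forall[\Gamma] \\
\text{(regfile type)} & \Gamma & ::= & \{\mathtt{r0} : \tau_0, \ldots, \mathtt{r}n : \tau_n\} \\
\text{(heap type)} & \Psi & ::= & \{l_0 : \tau_0, \ldots, l_n : \tau_n\} \\
\text{(label)} & l & ::= & 0 \mid 1 \mid \ldots \\
\text{(register)} & r & ::= & \mathtt{r0} \mid \mathtt{r1} \mid \ldots \mid \mathtt{r7} \\
\text{(word val)} & v & ::= & l \mid i \\
\text{(code heap val)} & h & ::= & \mathtt{code}\ [\Gamma].I \mid \mathtt{stub}\ [\Gamma].\emptyset \\
\text{(heap val)} & h & ::= & [i_0, \ldots, i_n] \mid (v_0, v_1) \\
\text{(instr)} & \iota & ::= & \mathtt{add}\ r_d, r_s, r_t \mid \mathtt{movi}\ r_d, i \mid \mathtt{movl}\ r_d, l \mid \mathtt{ld}\ r_d, r_s(i) \\
 & & \mid & \mathtt{st}\ r_d(i), r_s \mid \mathtt{bgt}\ r_s, r_t, l \mid \mathtt{bgti}\ r_s, i, l \mid \mathtt{newpair}\ r_d[\tau_0, \tau_1] \\
\text{(instr seq)} & I & ::= & \iota; I \mid \mathtt{jd}\ l \mid \mathtt{jmp}\ r \\
\text{(code heap)} & C & ::= & \{l_0 \mapsto h_0, \ldots, l_n \mapsto h_n\} \\
\text{(data heap)} & H & ::= & \{l_0 \mapsto h_0, \ldots, l_n \mapsto h_n\} \\
\text{(regfile)} & R & ::= & \{\mathtt{r0} \mapsto v_0, \ldots, \mathtt{r}n \mapsto v_n\} \\
\text{(program)} & \mathbb{P} & ::= & (C, H, R, I)
\end{array}
$$

Fig. 5. XTAL syntax.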

The abstract state of an XTAL program is composed of code and data heaps, a register
file, and current instruction sequence. Labels are simply integers and the domains
of the code and data heaps are to be disjoint. Besides the newpair operation, the
arithmetic, memory access, and control flow instructions of XTAL correspond directly to
those of the machine defined in Section 2.1. The movl instruction is constrained to refer
only to code heap labels. Note that programs are written in continuation passing style; thus
every code block ends with some form of jump to another location in the code heap.


4.2  Static  and  Dynamic  Semantics 


The dynamic (operational) semantics of the XTAL abstract machine is defined by a set
of rules of the form $\mathbb{P} \mapsto \mathbb{P}'$. This evaluation relation is entirely standard (see [15, 14])
except that the case when jumping to a stub value in the code heap is not handled. The
complete rules are omitted here.

For the static semantics, we define a set of judgments as illustrated in Figure 6. Only
a few of the critical XTAL typing rules are presented here. The top-level typing rule for
XTAL programs requires well-formedness of the code and data heaps, register file, and
current instruction sequence, and that I is somewhere in the code heap:

$$
\frac{\begin{array}{c}
\vdash C \qquad \vdash H : \Psi \qquad \vdash R : \Gamma \qquad C; \Gamma \vdash I \\
\exists l \in \mathit{Dom}(C).\ C(l) = \mathtt{code}\ [\Gamma'].I' \ \text{and}\ I \subseteq I'
\end{array}}
     {\vdash (C, H, R, I)}
\ \text{(PROG)}
$$


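$$
\begin{array}{ll}
\text{Judgment} & \text{Meaning} \\
\hline
\vdash \Gamma_0 \subseteq \Gamma_1 & \Gamma_0 \text{ is a register file subtype of } \Gamma_1 \\
\vdash (C, H, R, I) & (C, H, R, I) \text{ is a well-formed program} \\
\vdash C & C \text{ is a well-formed code heap} \\
\vdash H : \Psi & H \text{ is a well-formed data heap of type } \Psi \\
\vdash R : \Gamma & R \text{ is a well-formed reg. file of type } \Gamma \\
C \vdash h\ \mathit{cdval} & h \text{ is a well-formed code heap value} \\
\Psi \vdash h : \tau\ \mathit{hval} & h \text{ is a well-formed data heap value of type } \tau \\
\Psi \vdash v : \tau & v \text{ is a well-formed word value of type } \tau \\
C; \Gamma \vdash I & I \text{ is a well-formed instruction sequence}
\end{array}
$$

Fig. 6. Static judgments.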


Heap and register file typing depends on the well-formedness of the elements in
each. Stub values are simply assumed to have the specified code type. From the
instruction typing rules, we show below the rules for newpair, jd, and jmp. The newpair
instruction expects initialization values for the newly allocated space in registers r0 and
r1 and a pointer to the new pair is put in $r_d$.


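$$
\frac{C; \Gamma \vdash I}{C \vdash \mathtt{code}\ [\Gamma].I\ \mathit{cdval}}\ \text{(CODE)}
\qquad
\frac{}{C \vdash \mathtt{stub}\ [\Gamma].\emptyset\ \mathit{cdval}}\ \text{(STUB)}
$$

$$
\frac{\Gamma(\mathtt{r0}) = \tau_0 \qquad \Gamma(\mathtt{r1}) = \tau_1 \qquad C; \Gamma\{r_d : \tau_0 \times \tau_1\} \vdash I}
     {C; \Gamma \vdash \mathtt{newpair}\ r_d[\tau_0, \tau_1]; I}
\ \text{(IS-NEWPAIR)}
$$

$$
\frac{\mathit{typeof}(C(l)) = \forall[\Gamma'] \qquad \vdash \Gamma \subseteq \Gamma'}
     {C; \Gamma \vdash \mathtt{jd}\ l}
\ \text{(IS-JD)}
\qquad
\frac{\Gamma(r) = \forall[\Gamma'] \qquad \vdash \Gamma \subseteq \Gamma'}
     {C; \Gamma \vdash \mathtt{jmp}\ r}
\ \text{(IS-JMP)}
$$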


Although  the  details  of  the  type  system  are  certainly  important,  the  key  thing  to 
be  understood  here  is  just  that  we  are  able  to  encode  the  syntactic  judgment  forms  of 
XTAL  in  our  logic  and  prove  soundness  in  Wright-Felleisen  style  [26].  We  will  then 
refer  to  these  judgments  in  CAP  assertions  during  the  process  of  proving  machine  code 
safety. 
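
To make the jump rules of this subsection concrete, the following Python sketch checks the premises of (IS-JD) and (IS-JMP). Several things here are assumptions made for illustration and are not taken from the paper: the tuple encoding of types, the function names, and in particular the reading of register file subtyping as width subtyping (the current register file must supply every register the target expects, at the same type), which is the usual TAL convention but is not defined in this section.

```python
from typing import Dict, Tuple, Union

# Assumed encoding of XTAL types: "int", "array", ("pair", t0, t1),
# or ("code", regfile_type) for a code pointer type.
Ty = Union[str, Tuple]
RegFileType = Dict[str, Ty]


def regfile_subtype(gamma: RegFileType, expected: RegFileType) -> bool:
    """Assumed width subtyping: gamma provides every register expected demands."""
    return all(r in gamma and gamma[r] == t for r, t in expected.items())


def check_jmp(gamma: RegFileType, r: str) -> bool:
    """IS-JMP: r holds a code pointer whose expected register file gamma satisfies."""
    t = gamma.get(r)
    return isinstance(t, tuple) and t[0] == "code" and regfile_subtype(gamma, t[1])


def check_jd(code_types: Dict[int, Ty], gamma: RegFileType, label: int) -> bool:
    """IS-JD: the label names code whose expected register file gamma satisfies."""
    t = code_types.get(label)
    return isinstance(t, tuple) and t[0] == "code" and regfile_subtype(gamma, t[1])


# Usage: r7 holds a continuation that expects an int in r0.
continuation = ("code", {"r0": "int"})
gamma = {"r0": "int", "r1": "array", "r7": continuation}
jump_is_well_typed = check_jmp(gamma, "r7")
```

The extra register r1 in the example does not prevent the jump, which is the point of subtyping on register files: a continuation only constrains the registers it mentions.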


4.3  External  Code  Stub  Interfaces 

XTAL  can  pass  around  pointers  to  arrays  in  its  data  heap  but  has  no  built-in  operations 
for  allocating,  accessing,  or  modifying  arrays.  We  provide  these  through  code  stubs: 

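$$
\begin{array}{lcl}
\mathtt{newarray} & \mapsto & \mathtt{stub}\ [\{\mathtt{r0} : \mathtt{int},\ \mathtt{r1} : \mathtt{int},\ \mathtt{r7} : \forall[\{\mathtt{r0} : \mathtt{array}\}]\}].\emptyset \\
\mathtt{arrayget} & \mapsto & \mathtt{stub}\ [\{\mathtt{r0} : \mathtt{array},\ \mathtt{r1} : \mathtt{int},\ \mathtt{r7} : \forall[\{\mathtt{r0} : \mathtt{int}\}]\}].\emptyset \\
\mathtt{arrayset} & \mapsto & \mathtt{stub}\ [\{\mathtt{r0} : \mathtt{array},\ \mathtt{r1} : \mathtt{int},\ \mathtt{r2} : \mathtt{int},\ \mathtt{r7} : \forall[\{\mathtt{r0} : \mathtt{array}\}]\}].\emptyset
\end{array}
$$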

newarray expects a length and initial value as arguments, allocates and initializes a
new array accordingly, and then jumps to the code pointer in r7. The accessor operations
similarly expect an array and index arguments and will return to the continuation pointer
in r7 when they have performed the operation. As is usually the case when dealing with
external libraries, the interfaces (code types) defined above do not provide a complete
specification of the operations (such as bounds-checking issues). Section 5.3 discusses
how we deal with this in the context of the safety of XTAL programs and the final
executable machine code.

4.4  Soundness 

As usual, we need to show that our XTAL type system is sound with respect to the
operational semantics of the abstract machine. This can be done using the standard
progress and preservation lemmas. However, in the presence of code stubs, the complete
semantics of a program is undefined, so at this level of abstraction we can only assume
that those typing rules are sound. In the next section, when compiling XTAL programs
to the real machine and linking in code for these libraries and stubs, we will need to
prove at that point that the linked code is sound with respect to the XTAL typing rules.
Let us define the state when the current XTAL program is jumping to external code:

Definition 1 (External call state). We define the current instruction of a program, $(C, H, R, I)$, to be an external call if $I \in \{\mathrm{jd}\ l,\ \mathrm{jmp}\ r,\ \mathrm{bgt} \ldots,\ \mathrm{bgti} \ldots\}$ and $C(l) = \mathrm{stub}\ [\Gamma].0$ or $C(R(r)) = \mathrm{stub}\ [\Gamma].0$, as appropriate.
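As an informal illustration of Definition 1, the check can be written as a small predicate over a toy model of the code heap. This is only a sketch under stated assumptions: the classes `Code` and `Stub`, the tuple encoding of instructions, and the convention that the branch label is the last operand of bgt and bgti are all hypothetical choices made for the example, not part of the XTAL encoding.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Code:
    """Models a code-heap entry `code [Gamma].I` (instruction body omitted)."""
    gamma: Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Stub:
    """Models a code-heap entry `stub [Gamma].0`: an interface with no XTAL body."""
    gamma: Tuple[Tuple[str, str], ...]


CodeHeap = Dict[str, Union[Code, Stub]]
Instr = Tuple[str, ...]  # assumed encoding, e.g. ("jd", "l") or ("jmp", "r7")


def jump_target(instr: Instr, regs: Dict[str, object]) -> Optional[object]:
    """Return the label a control-transfer instruction may jump to, if any."""
    op = instr[0]
    if op == "jd":                 # direct jump: the label is the operand
        return instr[1]
    if op == "jmp":                # indirect jump: the label is read from a register
        return regs.get(instr[1])
    if op in ("bgt", "bgti"):      # branches: label assumed to be the last operand
        return instr[-1]
    return None                    # not a control-transfer instruction


def is_external_call(code_heap: CodeHeap, regs: Dict[str, object], instr: Instr) -> bool:
    """True when the current instruction targets a stub entry of the code heap."""
    return isinstance(code_heap.get(jump_target(instr, regs)), Stub)
```

For example, with a code heap that maps "newarray" to a `Stub`, both `("jd", "newarray")` and `("jmp", "r7")` with r7 holding "newarray" count as external calls, while an arithmetic instruction does not.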

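Theorem 4 (XTAL Progress). If $\vdash \mathbb{P}$ and the current instruction of $\mathbb{P}$ is not an external call then there exists $\mathbb{P}'$ such that $\mathbb{P} \longmapsto \mathbb{P}'$.

Theorem 5 (XTAL Preservation). If $\vdash \mathbb{P}$ and $\mathbb{P} \longmapsto \mathbb{P}'$ then $\vdash \mathbb{P}'$.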

These theorems are proven by induction on the well-formed instruction premise $(C; \Gamma \vdash I)$ of the top-level typing rule $(\vdash \mathbb{P})$. Of course, the proofs must be done entirely in the FPCC logic in which the XTAL language is encoded.

In  our  previous  work  [13, 14],  we  demonstrated  how  to  get  from  these  proofs  of 
soundness  directly  to  the  FPCC  safety  proof.  However,  now  we  have  an  extra  level  to 
go  through  (the  CAP  system)  in  which  we  will  also  be  linking  external  code  to  XTAL 
programs,  and  we  must  ensure  safety  of  the  complete  package  at  the  end. 

5  Compilation  and  Linking 

In this section we first define how abstract XTAL programs will be translated to, and laid out in, the real machine state (the runtime memory layout). We also define the necessary library routines as CAP code (the runtime system). Then, after compiling and linking an XTAL program to CAP, we must show how to maintain the well-formedness of that CAP state so that we can apply Theorem 2 to obtain the final FPCC proof of safety.

5.1  The  Runtime  System 

In our simple runtime system, memory is divided into three sections: a static data area (used for global constants and library data structures), a read-only code area (which might be further divided into subareas for external ($\mathcal{E}$) and program code), and the dynamic heap area, which can grow indefinitely in our idealized machine. We use a data allocation framework where a heap limit, stored in a fixed allocation pointer register,² designates a finite portion of the dynamic heap area as having been allocated for use. (Our safety policy could use this to specify “readable” and “writeable” memory.)

2  XTAL  source  programs  use  fewer  registers  than  the  actual  machine  provides. 


5.2  Translating  XTAL  Programs  to  CAP 

We  now  outline  how  to  construct  (compile)  an  initial  CAP  machine  state  from  an  XTAL 
program.  Given  an  initial  XTAL  program,  we  need  the  following  (partial)  functions  or 
mappings  to  produce  the  CAP  state: 

- $\mathcal{A}_C : \mathit{label} \to \mathit{Word}$ - a layout mapping from XTAL code heap labels to CAP machine addresses.

- $\mathcal{A}_D : \mathit{label} \to \mathit{Word}$ - a layout mapping from XTAL data heap labels to CAP machine addresses. Both the domains and the ranges of the two layout functions should be disjoint. We use $\mathcal{A}$ without any subscript to indicate the union of the two: $\mathcal{A} = \mathcal{A}_C \cup \mathcal{A}_D$.

- $\mathcal{E} : \mathit{Word} \to \mathit{CmdList} \times \mathit{Pred}$ - the external (from XTAL's point of view) code blocks and their CAP preconditions for well-formedness. Proving that these blocks are well-formed according to the preconditions will be a proof obligation when verifying the safety of the complete CAP state. The range of $\mathcal{A}_C$ may overlap with the domain of $\mathcal{E}$ - these addresses are the implementation of XTAL code stubs.
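To make the side conditions on these mappings concrete, the following Python sketch models the two layout functions and the external code environment as dictionaries and checks the disjointness requirements stated above. The function name `check_layout` and the dictionary encoding are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple


def check_layout(
    a_c: Dict[str, int],                    # code heap labels -> machine addresses
    a_d: Dict[str, int],                    # data heap labels -> machine addresses
    ext: Dict[int, Tuple[List[str], str]],  # address -> (commands, precondition)
) -> Tuple[Dict[str, int], Set[str]]:
    """Check the disjointness conditions and return (A, labels implemented by stubs)."""
    if set(a_c) & set(a_d):
        raise ValueError("domains of the two layout mappings overlap")
    if set(a_c.values()) & set(a_d.values()):
        raise ValueError("ranges of the two layout mappings overlap")
    layout = {**a_c, **a_d}  # A is the union of A_C and A_D
    # Code labels whose address lies in Dom(E) are implemented by external code.
    stub_labels = {label for label, addr in a_c.items() if addr in ext}
    return layout, stub_labels
```

Here a code label is recognized as a stub exactly when its address falls in the domain of the external code environment, which is the overlap the last item of the list allows.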

With these elements, the translation from XTAL programs to CAP is quite straightforward. As in [13], we can describe the translation by a set of relations and associated inference rules. Because of limited space, we only show here the top-level rule:

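\[
\frac{
\begin{array}{c}
\mathcal{A} \vdash (C, H) \Rightarrow \mathbb{M} \qquad \mathcal{A} \vdash R \Rightarrow \mathbb{R} \qquad \mathcal{A}_C \vdash I \Rightarrow \mathbb{C} \qquad \mathrm{Flatten}(\mathbb{C}, \mathbb{M}, pc) \\
\exists l.\ C(l) = \mathrm{code}\ [\Gamma].I' \ \wedge\ I \subseteq_{\mathrm{tail}} I' \ \wedge\ pc = \mathcal{A}_C(l) + |I'| - |I| \\
\forall w \in \mathrm{Dom}(\mathcal{E}).\ \mathrm{Flatten}(\mathrm{Fst}(\mathcal{E}(w)), \mathbb{M}, w)
\end{array}
}{
\mathcal{E}; \mathcal{A} \vdash (C, H, R, I) \Rightarrow (\mathbb{M}, \mathbb{R}, pc)
}
\quad (\textsc{tr-prc})
\]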

Register files and word values translate fairly directly between XTAL and the machine. XTAL labels are translated to machine addresses using the $\mathcal{A}$ functions. Every heap value in the code and data heaps must correspond to an appropriately translated sequence of words in memory. All XTAL instructions translate directly to a single machine command except newpair, which translates to a series of commands that adjust the allocation pointer to make space for a new pair and then copy the initial values from r0 and r1 into the new space. We ignore the stubs in the XTAL code heap translation because they are handled in the top-level translation rule shown above (when $\mathcal{E}$ is Flatten'ed).
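The program-counter premise of the top-level rule can be illustrated with a short Python sketch. It rests on two assumptions that go beyond the text: Flatten is read as "the commands occupy consecutive memory words starting at the given address", and every command occupies exactly one word (so the example avoids newpair, which expands to several commands). The helper names are hypothetical.

```python
from typing import Dict, List


def is_tail(rest: List[str], block: List[str]) -> bool:
    """True when `rest` is a suffix of `block` (the tail premise of the rule)."""
    n = len(rest)
    return n <= len(block) and block[len(block) - n:] == rest


def program_counter(block_addr: int, block: List[str], rest: List[str]) -> int:
    """Compute pc = A_C(l) + |I'| - |I| for the remaining instructions `rest`."""
    if not is_tail(rest, block):
        raise ValueError("current instructions are not a tail of the block")
    return block_addr + len(block) - len(rest)


def flatten(cmds: List[str], memory: Dict[int, str], start: int) -> bool:
    """Assumed reading of Flatten: cmds sit in consecutive words from `start`."""
    return all(memory.get(start + i) == cmd for i, cmd in enumerate(cmds))
```

For a three-command block laid out at address 100, a program that has executed the first command has two commands remaining, so its program counter is 100 + 3 - 2 = 101, and the remaining commands are flattened at that address.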

5.3  Generating  the  CAP  Proofs 

In this section we proceed in a top-down manner by first stating the main theorem we wish to establish. The theorem says that, for a given runtime system, any well-typed XTAL program that compiles and links to the runtime will result in an initial machine state that is well-formed according to the CAP typing rules. Applying Theorem 2, we would then be able to produce an FPCC package certifying the safety of the initial machine state.

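Theorem 6 (XTAL-CAP Safety Theorem). For some specified external code environment $\mathcal{E}$, and for all $\mathbb{P}$ and $\mathcal{A}$, if $\vdash \mathbb{P}$ (in XTAL) and $\mathcal{E}; \mathcal{A} \vdash \mathbb{P} \Rightarrow \mathbb{S}$, then $\vdash \mathbb{S}$ (in CAP).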


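To prove that the CAP state is well-formed (using the (CAP-STATE) rule, Figure 4), we need a code heap specification, $\Phi$, and a top-level precondition, $P$, for the current program counter. The code specification is generated as follows: $\Phi = \mathrm{CpGen}(\mathcal{E}, \mathcal{A}_C, C)$, where

\[
\mathrm{CpGen}(\mathcal{E}, \mathcal{A}_C, C)(w) =
\begin{cases}
\mathrm{CpInv}(\mathcal{A}_C, C, \Gamma) & \text{if } w \notin \mathrm{Dom}(\mathcal{E}) \text{ and } \exists l.\ \mathcal{A}_C(l) = w \wedge C(l) = (\mathrm{code}\ [\Gamma].I) \\
\mathrm{Snd}(\mathcal{E}(w)) & \text{if } w \in \mathrm{Dom}(\mathcal{E})
\end{cases}
\]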


That is, for external code blocks, the precondition comes directly from $\mathcal{E}$, while for code blocks that have been compiled from XTAL, the CAP preconditions are constructed by the following definition:


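\[
\begin{array}{rcl}
\mathrm{CpInv}(\mathcal{A}_C, C, \Gamma) &=& \lambda\mathbb{S}.\ \exists H, R, \mathcal{A}_D.\ (\vdash C) \wedge (C; H \vdash R : \Gamma) \\
&& {} \wedge (\mathcal{A} \vdash (C, H) \Rightarrow \mathbb{S}.\mathbb{M}) \wedge (\mathcal{A} \vdash R \Rightarrow \mathbb{S}.\mathbb{R})
\end{array}
\]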


For any given program, the code heap and layout ($C$ and $\mathcal{A}_C$) must be unchanged; they are therefore global parameters of these predicate generators. CpInv captures the fact that at a particular machine state there is a well-typed XTAL memory and register file that syntactically corresponds to it. We only need to specify the register file type as an argument to CpInv because the typing rules for the well-formed register file and heap will imply all the necessary restrictions on the data heap structure. One of the main insights of this work is the definition of CpInv, which allows us both to establish a syntactic invariant on CAP machine states and to define the interface between XTAL and library code at the CAP level. CpInv is based on an idea similar to the global invariant defined in [13], but instead of a generic, monolithic safety proof using the syntactic encoding of the type system, CpInv makes clear what the program-specific preconditions are for each command (instruction) and allows for easy manipulation of and reasoning about them, as well as interaction with other type system-based invariants.
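The case split in the definition of CpGen can be sketched in Python as a function that returns the code specification as a partial map from addresses to preconditions. In this sketch the invariant generator is passed in as a parameter (`cp_inv`), preconditions are opaque values, and code heap entries are encoded as tagged tuples; all of these are assumptions made for illustration and say nothing about how the predicates are represented in the FPCC logic.

```python
from typing import Callable, Dict, List, Optional, Tuple

Gamma = Tuple[Tuple[str, str], ...]  # register file type, e.g. (("r0", "int"),)


def cp_gen(
    ext: Dict[int, Tuple[List[str], object]],  # E: address -> (commands, precondition)
    a_c: Dict[str, int],                       # A_C: code label -> address
    code_heap: Dict[str, tuple],               # C: label -> ("code", gamma, instrs) or ("stub", gamma)
    cp_inv: Callable[[Gamma], object],         # stand-in for CpInv(A_C, C, .)
) -> Callable[[int], Optional[object]]:
    """Build the code specification as a function on machine addresses."""
    addr_to_label = {addr: label for label, addr in a_c.items()}

    def phi(w: int) -> Optional[object]:
        if w in ext:
            return ext[w][1]                   # external block: Snd(E(w))
        entry = code_heap.get(addr_to_label.get(w))
        if entry is not None and entry[0] == "code":
            return cp_inv(entry[1])            # compiled XTAL block: CpInv on its Gamma
        return None                            # the specification is undefined here

    return phi
```

Addresses that are neither external nor the start of a compiled XTAL block map to `None`, reflecting that the specification is only defined at block entry points.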

Returning to the proof of Theorem 6, if we define the top-level precondition of the (CAP-STATE) rule to be $\mathrm{CpInv}(\mathcal{A}_C, C, \Gamma)$, then it is trivially satisfied on the initial state $\mathbb{S}$ by the premises of the theorem. We now have to show well-formedness of the code at the current program counter, $\Phi \vdash \{P\}\ \mathbb{C}$, and, in fact, proofs of the same judgment form must be provided for each of the code blocks in the heap, according to the (CAP-CDSPEC) rule. The correctness of the CAP code memory is shown by the theorem:


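Theorem 7 (XTAL-CAP Code Heap Safety). For a specified $\mathcal{E}$, and for any XTAL program state $(C, H, R, I)$, register file type $\Gamma$, layout functions $\mathcal{A}$ and machine state $(\mathbb{M}, \mathbb{R}, pc)$, such that $\vdash (C, H, R, I)$ and $\mathcal{E}; \mathcal{A} \vdash (C, H, R, I) \Rightarrow (\mathbb{M}, \mathbb{R}, pc)$, if $\Phi = \mathrm{CpGen}(\mathcal{E}, \mathcal{A}_C, C)$, then $\vdash \mathbb{M} : \Phi$.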

This depends in turn on the proof that each well-typed XTAL instruction sequence translated to machine commands will be well-formed in CAP under CpInv:


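Theorem 8 (XTAL-CAP Instruction Safety). For a specified $\mathcal{E}$, and for all $\mathcal{A}_C$, $C$, $I$, $\Gamma$, and $\mathbb{C}$ (where $\Phi = \mathrm{CpGen}(\mathcal{E}, \mathcal{A}_C, C)$), if $C; \Gamma \vdash I$ and $\mathcal{A}_C \vdash I \Rightarrow \mathbb{C}$ then $\Phi \vdash \{\mathrm{CpInv}(\mathcal{A}_C, C, \Gamma)\}\ \mathbb{C}$.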


Due to space constraints, we omit details of the proof of this theorem except to mention that it is proved by induction on $I$. In cases where the current instruction maps directly to a machine command (i.e., other than newpair), the postcondition ($Q$ in the CAP rules) is generated by applying CpInv to the updated XTAL register file type. We use the XTAL safety theorems (4 and 5) here to show that $Q$ holds after one step of execution. In the case of the expanded commands of newpair, we must construct the intermediate postconditions by hand and then show that CpInv is re-established on the state after the sequence of expanded commands has completed. In the case of a jump to external code, we use the result of Proof Obligation 10 below.

Finally, establishing the theorems above depends on satisfying some proof obligations with respect to the external library code and its interfaces as specified at the XTAL level. First, we must show that the external library code is well-formed according to its supplied preconditions:

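Proof Obligation 9 (External Code Safety). For a given $\mathcal{E}$, if $\Phi = \mathrm{CpGen}(\mathcal{E}, \mathcal{A}_C, C)$ for any $\mathcal{A}_C$ and $C$, then $\Phi \vdash \{\mathrm{Snd}(\mathcal{E}(w))\}\ \mathrm{Fst}(\mathcal{E}(w))$, for all $w \in \mathrm{Dom}(\mathcal{E})$.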

For  now,  we  assume  that  the  proofs  of  this  lemma  are  constructed  “by  hand”  using 
the  rules  for  well-formedness  of  CAP  commands. 

Secondly, when linking the external code with a particular XTAL program, where certain labels of the XTAL code heap are mapped to external code addresses, we have to show that the typing environment that holds in any XTAL program jumping to such a label implies the actual precondition of the external code:

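Proof Obligation 10 (Interface Correctness). For a given $\mathcal{E}$, $\mathcal{A}_C$, and $C$, and for all $l$ such that $C(l) = \mathrm{stub}\ [\Gamma].0$ and $\mathcal{A}_C(l) = w$, if $\mathrm{CpInv}(\mathcal{A}_C, C, \Gamma)(\mathbb{S})$ then $\mathrm{Snd}(\mathcal{E}(w))(\mathbb{S})$.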

These properties must be proved for each instantiation of the runtime system $\mathcal{E}$. With them, the proofs of Theorems 8, 7, and, finally, 6 can be completed.

5.4  arrayget  Example 

As a concrete example of the process discussed in the foregoing subsection, let us consider arrayget. The XTAL type interface is defined in Section 4.3. An implementation of this function could be:

$\mathbb{C}_{aget}$ = [ld r8, r0(0); addi r1, r1, 1; bgt r1, r8, bnderr; add r0, r0, r1; ld r0, r0(0); jmp r7]

The runtime representation of an array in memory is a length field followed by the actual array of data. We assume that there is some exception handling routine for out-of-bounds accesses with a trivial precondition defined by $\mathcal{E}(\mathrm{bnderr}) = (\mathbb{C}_{bnderr}, Q_{bnderr})$. Before describing the CAP assertions for the safety of $\mathbb{C}_{aget}$, notice that the code returns indirectly to an XTAL function pointer. Similarly, the arrayget address can be passed around in XTAL programs as a first-class code pointer. While the syntactic type system handles these code pointers quite easily using the relevant XTAL types, dealing with code pointers in a Hoare logic-based setup like CAP requires a little bit of machinery.
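To see what the six commands of the arrayget implementation do on the stated array representation (a length word followed by the data), the following Python sketch steps through them on a toy machine. It is an informal model only: memory and registers are dictionaries, bgt is assumed to branch when its first operand is strictly greater than its second, and the jump to bnderr is modeled by raising a hypothetical `BoundsError`.

```python
from typing import Dict, Tuple


class BoundsError(Exception):
    """Stands in for the jump to the bnderr handler."""


def arrayget(memory: Dict[int, int], regs: Dict[str, int]) -> Tuple[Dict[str, int], int]:
    """Execute the arrayget commands; return the new register file and the next pc."""
    r = dict(regs)                      # leave the caller's register file untouched
    r["r8"] = memory[r["r0"]]           # ld r8, r0(0): load the length field
    r["r1"] = r["r1"] + 1               # addi r1, r1, 1: skip over the length word
    if r["r1"] > r["r8"]:               # bgt r1, r8, bnderr: index past the end
        raise BoundsError("index out of bounds")
    r["r0"] = r["r0"] + r["r1"]         # add r0, r0, r1: address of the element
    r["r0"] = memory[r["r0"]]           # ld r0, r0(0): load the element
    return r, r["r7"]                   # jmp r7: return to the continuation


# An array of length 3 stored at address 10: length word, then the data.
example_memory = {10: 3, 11: 7, 12: 8, 13: 9}
result_regs, next_pc = arrayget(example_memory, {"r0": 10, "r1": 1, "r7": 200})
```

In this run the element at index 1 is loaded into r0 and control continues at the address held in r7; an index of 3 would instead take the bounds-error path.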

We can thus proceed to directly define the precondition of $\mathbb{C}_{aget}$ as:


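\[
Q_{aget} = \mathrm{CpInv}(\mathcal{A}_C, C, \{\mathrm{r0}: \mathrm{array},\ \mathrm{r1}: \mathrm{int},\ \mathrm{r7}: (\forall[\{\mathrm{r0}: \mathrm{int}\}])\})
\]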


for some $\mathcal{A}_C$ and $C$. Then we certify the library code in CAP by providing a derivation of $(\Phi \vdash \{Q_{aget}\}\ \mathbb{C}_{aget})$. We do this by applying the appropriate rules from Figure 4 to track the changes that are made to the state with each command. When we reach the final jump to r7, we can then show that $\mathrm{CpInv}(\mathcal{A}_C, C, \{\mathrm{r0} : \mathrm{int}\})$ holds, which must be the precondition specified for the return code pointer by $\Phi(\mathbb{S}.\mathbb{R}(\mathrm{r7}))$ (see the definition of $\Phi$ at the beginning of Section 5.3). The problem with this method of certifying arrayget, however, is that we have explicitly included details about the source language type system in its preconditions. In order to make the proof more generic, while still being able to leverage the syntactic type system for certifying code pointers, we follow an approach similar to that of [27]: First, we define generic predicates for the pre- and postconditions, abstracting over an arbitrary external predicate, $P_{aget}$. The actual requirements of the arrayget code are minimal (for example, that the memory area of the array is readable according to the safety policy). The postcondition predicate relates the state of the machine upon exiting the code block to the initial entry state:

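\[
\begin{array}{rcl}
\mathrm{Pre} &=& \lambda P_{aget}.\ \lambda\mathbb{S}.\ P_{aget}(\mathbb{S}) \wedge \mathrm{SafeToRead}(\mathbb{S}.\mathbb{M},\ \mathbb{S}.\mathbb{R}(\mathrm{r0}),\ \mathbb{S}.\mathbb{R}(\mathrm{r1}) + 1) \\
\mathrm{Post} &=& \lambda(\mathbb{M}, \mathbb{R}, pc).\ \lambda(\mathbb{M}', \mathbb{R}', pc').\ \mathbb{M}' = \mathbb{M} \wedge pc' = \mathbb{R}(\mathrm{r7}) \\
&& {} \wedge \mathbb{R}'(\mathrm{r0}) = \mathbb{M}(\mathbb{R}(\mathrm{r0}) + \mathbb{R}(\mathrm{r1}) + 1) \wedge \ldots
\end{array}
\]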
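Read as executable predicates, Pre and Post can be sketched in Python as follows. Two caveats apply: `safe_to_read` is a hypothetical stand-in (the safety policy's SafeToRead is not defined in this section, so the sketch simply asks that the addressed word exist in memory), and `post` checks only the conjuncts written out above, not the elided ones.

```python
from typing import Callable, Dict, NamedTuple


class State(NamedTuple):
    mem: Dict[int, int]    # machine memory
    regs: Dict[str, int]   # register file
    pc: int                # program counter


def safe_to_read(mem: Dict[int, int], base: int, offset: int) -> bool:
    """Hypothetical stand-in for SafeToRead: the word at base + offset exists."""
    return (base + offset) in mem


def pre(p_aget: Callable[[State], bool]) -> Callable[[State], bool]:
    """Pre: the external predicate holds and the element word is readable."""
    return lambda s: p_aget(s) and safe_to_read(s.mem, s.regs["r0"], s.regs["r1"] + 1)


def post(s: State) -> Callable[[State], bool]:
    """Post: relate the exit state s2 to the entry state s (listed conjuncts only)."""
    def relation(s2: State) -> bool:
        return (
            s2.mem == s.mem                                   # memory is unchanged
            and s2.pc == s.regs["r7"]                         # control is at the continuation
            and s2.regs["r0"] == s.mem[s.regs["r0"] + s.regs["r1"] + 1]  # element returned
        )
    return relation
```

Instantiating the external predicate with a trivially true function shows the shape of the later instantiation step, where a type-system-specific predicate takes its place.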

Now we certify the arrayget code block, quantifying over all $P_{aget}$ and complete code specifications $\Phi$, but imposing some appropriate restrictions on them:

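\[
\begin{array}{l}
\Phi(\mathrm{bnderr}) = Q_{bnderr} \ \wedge\ (\forall \mathbb{S}, \mathbb{S}'.\ \mathrm{Pre}(P_{aget})(\mathbb{S}) \wedge \mathrm{Post}(\mathbb{S})(\mathbb{S}') \to \Phi(\mathbb{S}.\mathbb{R}(\mathrm{r7}))(\mathbb{S}')) \\
\qquad \longrightarrow\ \Phi \vdash \{\mathrm{Pre}(P_{aget})\}\ \mathbb{C}_{aget}
\end{array}
\]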

Thus, under the assumption that the Pre predicate holds, we can again apply the inference rules for CAP commands to show the well-formedness of the $\mathbb{C}_{aget}$ code. When we reach the final jump, we show that the Post predicate holds and then use that fact with the premise of the formula above to show that it is safe to jump to the return code pointer.

The arrayget code can thus be certified independently of any type system, by introducing the quantified $P_{aget}$ predicate. Now, when we want to use this as an external function for XTAL programs, we instantiate $P_{aget}$ with $Q_{aget}$ above. We have to prove the premise of the formula above, $(\forall \mathbb{S}, \mathbb{S}'.\ \mathrm{Pre}(Q_{aget})(\mathbb{S}) \wedge \mathrm{Post}(\mathbb{S})(\mathbb{S}') \to \Phi(\mathbb{S}.\mathbb{R}(\mathrm{r7}))(\mathbb{S}'))$. Proving this is not difficult, because we can use properties of the XTAL type system to show that, from a state satisfying the precondition (i.e., there is a well-formed XTAL program whose register file satisfies the arrayget type interface), the changes described by the Post predicate will result in a state to which there corresponds another well-formed XTAL program, one where the register r0 is updated with the appropriate element of the array. Then we can let $\mathcal{E}(\mathrm{arrayget}) = (\mathbb{C}_{aget}, \mathrm{Pre}(Q_{aget}))$ and we have satisfied Proof Obligation 9. Proof Obligation 10 follows almost directly given our definition of $Q_{aget}$.

In summary, we have shown how to certify runtime library code independently of a source language. In order to handle code pointers, we simply assume their safety as a premise; then, when using the library with a particular source language type system, we instantiate with a syntactic well-formedness predicate in the form of CpInv and use the facilities of the type system for checking code pointers to prove the safety of indirect jumps.

6  Implementation  and  Future  Work 

We  have  a  prototype  implementation  of  the  system  presented  in  this  paper,  developed 
using  the  Coq  proof  assistant.  Due  to  space  constraints,  we  have  left  out  its  details 
here.  As  mentioned  earlier  in  the  paper,  our  eventual  goal  is  to  build  an  FPCC  system 
for  real  IA-32  (Intel  x86)  machines.  We  have  already  applied  the  CAP  type  system 
to  that  architecture  and  will  now  need  to  develop  a  more  realistic  version  of  XTAL. 
Additionally,  our  experience  with  the  Coq  proof  assistant  leads  us  to  believe  that  there 
should  be  more  development  on  enhancing  the  automation  of  the  proof  tactics,  because 
many  parts  of  the  proofs  needed  for  this  paper  are  not  hard  or  complex,  but  tedious  to 
do  given  the  rather  simplistic  tactics  supplied  with  the  base  Coq  system. 

In this paper, we have implicitly assumed that the CAP machine code is generated from one of two sources: (a) XTAL source code, or (b) code written directly in CAP. However, more generally, our intention is to support code from multiple source type systems. In this case, the definition of CpGen (Section 5.3) would utilize code precondition invariant generators (CpInv) from the multiple type systems. The general form of each CpInv would be the same, although, of course, the particular typing environments and judgments would be different for each system. Then we would have a series of theorems like those in Section 5.3, specialized for each CpInv. Proof Obligation 10 would also be generalized as necessary, requiring proofs that the interfaces between the various type systems are compatible. Of course, there will be some amount of engineering required to get such a system up and running, but we believe that there is true potential for building a realistic, scalable FPCC framework along these lines.

7  Related  Work  and  Conclusion 

In the context of the original PCC systems cited in the Introduction, there has been recent work to improve their flexibility and reliability by removing type-system-specific components from the framework [19]. These systems have the advantage of working, production-quality implementations, but it is still unclear whether they can approach the trustworthiness goals of FPCC.

We also mentioned the first approaches to generating FPCC, which utilized semantic models of the source type system, and their resulting complexities. Attempting to address and hide the complexity of the semantic soundness proofs, Juan Chen et al. [6] have developed LTAL, a low-level typed assembly language which is used to compile core ML to FPCC. LTAL is based in turn upon an abstraction layer, TML (typed machine language) [22], which is an even lower-level intermediate language. Complex parts of the semantic proofs, such as the indexed model of recursive types and stratified model of mutable fields, are hidden in the soundness proof of TML, and as long as a typed assembly language can be compiled to TML, one need not worry about the semantic models. All the same, LTAL and TML are only assembly language type systems, albeit at a much lower level than XTAL. They do not provide CAP's generality of reasoning, nor can their type systems be used to certify their own runtime system components. It should be clearly noted that the ideas presented in this paper are not restricted to use with a syntactic FPCC approach, as we have pursued. Integrating LTAL or TML with the CAP framework of this paper to certify their runtime system components seems feasible as well.

Following the syntactic approach to FPCC, Crary [9, 10] applied our methods [13, 14] to a realistic typed assembly language initially targeted to the Intel x86. He even went on to specify invariants about the garbage collector interface, but beyond the interface the implementation is still uncertified. In his work he uses the metalogical framework of Twelf [21] instead of the CiC-based Coq that we have been using.

In conclusion, there is much ongoing development of PCC technology for producing certified machine code from high-level source languages. Concurrently, there is exciting work on certifying garbage collectors and other low-level system libraries. However, integrating the high- and low-level proofs of safety has not yet received much attention. The ideas presented in this paper represent a viable approach to dealing with the issue of interfacing and integrating safety proofs of machine code from multiple sources in a fully certified framework.